
    NP Animacy Identification for Anaphora Resolution

    In anaphora resolution for English, animacy identification can play an integral role in applying agreement restrictions between pronouns and candidates and, as a result, can improve the accuracy of anaphora resolution systems. In this paper, two methods for animacy identification are proposed and evaluated using intrinsic and extrinsic measures. The first method is rule-based and uses information about the unique beginners in WordNet to classify NPs on the basis of their animacy. The second method relies on a machine learning algorithm which exploits a WordNet enriched with animacy information for each sense. The effect of word sense disambiguation on the two methods is also assessed. The intrinsic evaluation reveals that the machine learning method reaches human levels of performance. The extrinsic evaluation demonstrates that animacy identification can be beneficial in anaphora resolution, especially in cases where animate entities are identified with high precision.
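
    The rule-based method can be pictured with a short sketch using NLTK's WordNet interface: every noun sense belongs to a lexicographer file rooted in one of WordNet's unique beginners (noun.person, noun.animal, noun.artifact, ...), and voting over a noun's senses yields an animacy class. The category lists and the equal-vote rule below are illustrative assumptions, not the paper's exact rules.

```python
# pip install nltk; then nltk.download("wordnet") once before use.
from nltk.corpus import wordnet as wn

ANIMATE_LEXNAMES = {"noun.person", "noun.animal"}  # assumed animate beginners
INANIMATE_LEXNAMES = {"noun.artifact", "noun.object",
                      "noun.substance", "noun.plant"}  # assumed inanimate

def classify_animacy(head_noun: str) -> str:
    """Vote over all noun senses; without WSD, every sense counts equally."""
    animate = inanimate = 0
    for synset in wn.synsets(head_noun, pos=wn.NOUN):
        if synset.lexname() in ANIMATE_LEXNAMES:
            animate += 1
        elif synset.lexname() in INANIMATE_LEXNAMES:
            inanimate += 1
    if animate == inanimate:
        return "unknown"
    return "animate" if animate > inanimate else "inanimate"

print(classify_animacy("teacher"))  # -> animate
print(classify_animacy("table"))   # -> inanimate
```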

    The evaluation of liver fibrosis regression in chronic hepatitis C patients after the treatment with direct-acting antiviral agents – A review of the literature

    The second generation of direct-acting antiviral agents is the current treatment for chronic viral hepatitis C infection. To evaluate the regression of liver fibrosis in patients receiving this therapy, liver biopsy remains the most accurate method, but the invasiveness of the procedure is its major drawback. Different non-invasive tests have been used to study changes in the stage of liver fibrosis in patients with chronic viral hepatitis treated with second-generation direct-acting antiviral agents: liver stiffness measurements (with transient elastography or acoustic radiation force impulse elastography) or scores that combine serum markers into a fibrosis index. We present a literature review of the available data regarding the long-term evolution of liver fibrosis after treatment with direct-acting antiviral agents for chronic viral hepatitis C.
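
    For concreteness, one widely used serum-marker score of this kind is the FIB-4 index, which combines age, transaminase levels and platelet count. The review covers such scores generically rather than prescribing this one, so the sketch below is only an example.

```python
import math

def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_10e9_per_l: float) -> float:
    """FIB-4 index: (age * AST) / (platelets * sqrt(ALT)).
    Commonly cited cut-offs (roughly <1.45 low, >3.25 high probability
    of advanced fibrosis) vary across studies."""
    return (age_years * ast_u_per_l) / (
        platelets_10e9_per_l * math.sqrt(alt_u_per_l))

# Toy values: a 55-year-old with AST 40 U/L, ALT 35 U/L, platelets 180e9/L.
print(round(fib4(55, 40, 35, 180), 2))
```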

    CANELC: constructing an e-language corpus

    This paper reports on the construction of CANELC, the Cambridge and Nottingham e-language Corpus: a one million word corpus of digital communication in English, taken from online discussion boards, blogs, tweets, emails and SMS messages. The paper outlines the approaches used when planning the corpus: obtaining consent, collecting the data and compiling the corpus database. This is followed by a detailed analysis of some of the patterns of language used in the corpus, including a discussion of the key words and phrases used as well as the common themes and semantic associations connected with the data. These discussions form the basis of an investigation of how e-language operates in ways both similar to and different from spoken and written records of communication (as evidenced by the British National Corpus, BNC). The corpus was built as part of a collaborative project between The University of Nottingham and Cambridge University Press, with whom sole copyright of the annotated corpus resides. Plans to extend the corpus are under discussion, and the legal dimension of corpus 'ownership' of some forms of unannotated data is complex and under constant review. At present the annotated corpus is available only to authors and researchers working for CUP and is not more generally available.
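
    The key-word analysis described above is typically computed with a keyness statistic that compares word frequencies in the corpus against a reference corpus such as the BNC. A minimal sketch, assuming the standard log-likelihood measure (the paper does not spell out its exact computation):

```python
import math

def log_likelihood(freq_target: int, size_target: int,
                   freq_ref: int, size_ref: int) -> float:
    """Log-likelihood keyness of one word: how strongly its frequency in
    the target corpus diverges from its frequency in a reference corpus."""
    total = size_target + size_ref
    expected_target = size_target * (freq_target + freq_ref) / total
    expected_ref = size_ref * (freq_target + freq_ref) / total
    ll = 0.0
    if freq_target:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Toy numbers: a word seen 500 times in a 1M-word e-language corpus
# versus 2,000 times in a 100M-word reference corpus.
print(round(log_likelihood(500, 1_000_000, 2_000, 100_000_000), 1))
```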

    Measuring text simplification with the crowd

    Text can often be complex and difficult to read, especially for people with cognitive impairments or low literacy skills. Text simplification is a process that reduces the complexity of both wording and structure in a sentence, while retaining its meaning. However, this is currently a challenging task for machines, and thus, providing effective on-demand text simplification to those who need it remains an unsolved problem. Even evaluating the simplicity of text remains a challenging problem for both computers, which cannot understand the meaning of text, and humans, who often struggle to agree on what constitutes a good simplification. This paper focuses on the evaluation of English text simplification using the crowd. We show that leveraging crowds can result in a collective decision that is accurate and converges to a consensus rating. Our results from 2,500 crowd annotations show that the crowd can effectively rate levels of simplicity. This may allow simplification systems and system builders to get better feedback about how well content is being simplified, as compared to standard measures which classify content into 'simplified' or 'not simplified' categories. Our study provides evidence that the crowd could be used to evaluate English text simplification, as well as to create simplified text in future work.
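
    A minimal sketch of the aggregation idea: collect per-item simplicity ratings from successive crowd workers and stop once the running mean stabilises. The stopping rule and the 5-point scale are assumptions for illustration, not the study's protocol.

```python
from statistics import mean

def converged(ratings: list[float], window: int = 5, tol: float = 0.2) -> bool:
    """Heuristic stop rule: the running mean moved less than `tol` over
    the last `window` ratings. An assumption, not the paper's protocol."""
    if len(ratings) < window + 1:
        return False
    return abs(mean(ratings) - mean(ratings[:-window])) < tol

# Simulated 5-point simplicity ratings for one original/simplified pair.
ratings: list[float] = []
for r in [4, 5, 4, 4, 3, 4, 4, 4]:
    ratings.append(r)
    if converged(ratings):
        print(f"consensus ~ {mean(ratings):.2f} after {len(ratings)} ratings")
        break
```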

    Are decision trees a feasible knowledge representation to guide extraction of critical information from randomized controlled trial reports?

    Background: This paper proposes the use of decision trees as the basis for automatically extracting information from published randomized controlled trial (RCT) reports. An exploratory analysis of RCT abstracts is undertaken to investigate the feasibility of using decision trees as a semantic structure. Quality-of-paper measures are also examined.
    Methods: A subset of 455 abstracts (randomly selected from a set of 7620 retrieved from Medline for 1998–2006) is examined for the quality of RCT reporting, the identifiability of RCTs from abstracts, and the completeness and complexity of RCT abstracts with respect to key decision tree elements. Abstracts were manually assigned to 6 sub-groups distinguishing primary RCTs from other design types. For primary RCT studies, we analyzed and annotated the reporting of intervention comparison, population assignment and outcome values. To measure completeness, the frequencies with which complete intervention, population and outcome information are reported in abstracts were measured. A qualitative examination of the reporting language was also conducted.
    Results: Decision tree elements are manually identifiable in the majority of primary RCT abstracts. 73.8% of a random subset were primary studies with a single population assigned to two or more interventions. Of these primary RCT abstracts, 68% were structured, 63% contained pharmaceutical interventions, and 84% reported the total number of study subjects. In a subset of 21 abstracts examined, 71% reported numerical outcome values.
    Conclusion: The manual identifiability of decision tree elements in abstracts suggests that decision trees could be a suitable construct to guide machine summarisation of RCTs. The presence of decision tree elements could also act as an indicator of RCT report quality in terms of completeness and uniformity.
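
    To make the target representation concrete, the decision-tree structure can be rendered as a small tree of typed nodes: a trial population branches into intervention arms, each carrying its reported outcome values. The field names and toy data below are illustrative assumptions, not the paper's schema.

```python
from dataclasses import dataclass, field

@dataclass
class Outcome:
    measure: str   # e.g. the outcome measure reported in the abstract
    value: str     # the numerical outcome value, kept as reported

@dataclass
class InterventionArm:
    intervention: str
    n_subjects: int
    outcomes: list[Outcome] = field(default_factory=list)

@dataclass
class TrialTree:
    population: str                 # single assigned population
    arms: list[InterventionArm] = field(default_factory=list)

# Hypothetical extraction result for one two-arm trial abstract.
tree = TrialTree(
    population="Adults with type 2 diabetes (n=200)",
    arms=[
        InterventionArm("drug A 10 mg/day", 100,
                        [Outcome("HbA1c change", "-0.8%")]),
        InterventionArm("placebo", 100,
                        [Outcome("HbA1c change", "-0.1%")]),
    ],
)
print(len(tree.arms), "arms extracted")
```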

    Comparing pronoun resolution algorithms

    This paper discusses the comparative evaluation of five well-known pronoun resolution algorithms, conducted with the help of a purpose-built tool for consistent evaluation in anaphora resolution termed the evaluation workbench. The workbench enables the evaluation and comparison of pronoun resolution algorithms on the basis of the same preprocessing tools and test data. The tool is controlled by the user, who can conduct the evaluation according to a variety of parameters with regard to the types of anaphors and the samples used for evaluation. The extensive comparative evaluation of the pronoun resolution algorithms showed that their performance was significantly lower than the figures reported in the original papers describing the algorithms. The evaluation study concluded that the main reason for this drop in performance is the fact that all algorithms operate in a fully automatic mode.
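
    The core of such a workbench is that every algorithm is scored on identical preprocessed input against identical gold annotations. A minimal sketch of that harness, with an assumed resolver interface (the workbench's real API is not described at this level of detail):

```python
from typing import Callable

# A resolver maps one pronoun's context to a chosen antecedent id.
Resolver = Callable[[dict], str]

def success_rate(resolver: Resolver, gold: list[dict]) -> float:
    """Fraction of anaphoric pronouns whose antecedent is resolved
    correctly against the shared gold annotations."""
    correct = sum(1 for case in gold if resolver(case) == case["antecedent"])
    return correct / len(gold)

def compare(resolvers: dict[str, Resolver], gold: list[dict]) -> None:
    """Score every algorithm on the same test data, as the workbench does."""
    for name, resolver in resolvers.items():
        print(f"{name}: {success_rate(resolver, gold):.1%}")

# Usage (hypothetical resolver functions and gold cases):
# compare({"algorithm_a": resolve_a, "algorithm_b": resolve_b}, gold_cases)
```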

    The QALL-ME Framework: A Specifiable-Domain Multilingual Question Answering Architecture.

    This paper presents the QALL-ME Framework, a reusable architecture for building multilingual Question Answering (QA) systems working on structured data. The framework is released as free open source software with a set of demo components and extensive documentation. As the main characteristics of the QALL-ME Framework we point out: (i) domain portability, achieved by an ontology modelling of the target domain; (ii) context awareness regarding the space and time of the question; (iii) the use of textual entailment engines as the core of question interpretation; and (iv) a Service Oriented Architecture realized with interchangeable web services. Furthermore, we present a running example to clarify how the framework processes questions, as well as a case study that shows a QA application successfully built with the QALL-ME Framework for cinema/movie events in the tourism domain.
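
    Point (iii) can be illustrated with a sketch: the incoming question is matched against stored question patterns, each paired with a retrieval procedure over the structured data, and the best-entailed pattern wins. The pattern store and engine interface below are stand-ins for illustration, not QALL-ME's actual interfaces.

```python
from typing import Callable

# An entailment engine scores how strongly `text` entails `hypothesis`.
EntailmentEngine = Callable[[str, str], float]

# Hypothetical pattern store: question pattern -> retrieval procedure name.
PATTERNS = {
    "Which films are showing in [LOCATION] on [DATE]?": "query_screenings",
    "Where is [CINEMA] located?": "query_cinema_address",
}

def interpret(question: str, entails: EntailmentEngine) -> str:
    """Pick the retrieval procedure of the best-entailed stored pattern."""
    best_pattern = max(PATTERNS, key=lambda p: entails(question, p))
    return PATTERNS[best_pattern]

# Usage (with some real entailment engine plugged in):
# interpret("What's on at the cinema in Trento tonight?", my_engine)
# -> "query_screenings"
```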

    Sentence retrieval for abstracts of randomized controlled trials

    Background: The practice of evidence-based medicine (EBM) requires clinicians to integrate their expertise with the latest scientific research, but this is becoming increasingly difficult with the growing number of published articles. There is a clear need for better tools to improve clinicians' ability to search the primary literature. Randomized clinical trials (RCTs) are the most reliable source of evidence documenting the efficacy of treatment options. This paper describes the retrieval of key sentences from abstracts of RCTs as a step towards helping users find relevant facts about the experimental design of clinical studies.
    Method: Using Conditional Random Fields (CRFs), a popular and successful method for natural language processing problems, sentences referring to Intervention, Participants and Outcome Measures are automatically categorized. This is done by extending a previous approach for labeling sentences in an abstract with general categories associated with scientific argumentation or rhetorical roles: Aim, Method, Results and Conclusion. The methods are tested on several corpora of RCT abstracts: first, structured abstracts with headings specifically indicating Intervention, Participant and Outcome Measures; in addition, a manually annotated corpus of structured and unstructured abstracts is prepared for testing a classifier that identifies sentences belonging to each category.
    Results: Using CRFs, sentences can be labeled for the four rhetorical roles with F-scores from 0.93 to 0.98, outperforming Support Vector Machines. Furthermore, sentences can be automatically labeled for Intervention, Participant and Outcome Measures in unstructured and structured abstracts where the section headings do not specifically indicate these three topics. F-scores of up to 0.83 and 0.84 are obtained for Intervention and Outcome Measure sentences.
    Conclusion: Results indicate that some of the methodological elements of RCTs are identifiable at the sentence level in both structured and unstructured abstract reports. This is promising in that sentences labeled automatically could potentially form concise summaries, assist in information retrieval, and support finer-grained extraction.
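
    A minimal sketch of sentence-level sequence labeling in this spirit, using the sklearn-crfsuite package (our tooling choice; the paper's feature set and implementation differ). Each abstract is a sequence of sentences, each sentence a feature dict, and the CRF assigns one rhetorical role per sentence.

```python
# pip install sklearn-crfsuite
import sklearn_crfsuite

def sentence_features(sentences: list[str], i: int) -> dict:
    """A few illustrative surface features for one sentence in context."""
    sent = sentences[i]
    return {
        "position": i / len(sentences),          # relative position in abstract
        "first_token": sent.split()[0].lower(),  # e.g. "we", "patients"
        "has_percent": "%" in sent,
        "has_number": any(ch.isdigit() for ch in sent),
    }

def featurize(abstract: list[str]) -> list[dict]:
    return [sentence_features(abstract, i) for i in range(len(abstract))]

# Tiny toy training set: one abstract as a sequence of sentences, with
# one rhetorical-role label per sentence.
train_abstracts = [
    ["We aimed to compare drug A with placebo.",
     "Patients were randomly assigned to two groups.",
     "Mean reduction was 12% versus 3%.",
     "Drug A was superior to placebo."],
]
train_labels = [["Aim", "Method", "Results", "Conclusion"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
crf.fit([featurize(a) for a in train_abstracts], train_labels)
print(crf.predict([featurize(train_abstracts[0])]))
```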